Bootstrapping relation extraction from semantic seeds
Abstract
Information Extraction (IE) is a technology for locating and classifying pieces of relevant information in unstructured natural language texts and detecting relevant relations among them. This thesis deals with one of the central tasks of IE, namely relation extraction. The goal is to provide a general framework that automatically learns mappings between linguistic analyses and target semantic relations with minimal human intervention. Furthermore, the framework is intended to support adaptation to new application domains and to new relations of varying complexity. The central result is a new approach to relation extraction based on a minimally supervised method for automatically learning extraction grammars from a large collection of parsed texts, initialized by a few instances of the target relation, called the semantic seed. Owing to the semantic seed approach, the framework can accommodate new relation types and domains with minimal effort. It supports relations of different arity as well as their projections. Furthermore, the framework is general enough to employ any linguistic analysis tools that provide the required type and depth of analysis. The adaptability and scalability of the framework are facilitated by the DARE rule representation model, which is recursive and compositional. In comparison to other IE rule representation models, e.g., Stevenson and Greenwood (2006), the DARE rule representation model is expressive enough to achieve good coverage of linguistic constructions for finding mentions of the target relation. The powerful DARE rules are constructed via a bottom-up, compositional rule discovery strategy driven by the semantic seed. The quality of newly acquired knowledge during the bootstrapping process is controlled through a ranking and filtering strategy that takes two aspects into account: the domain relevance and the trustworthiness of the origin. A spe-
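The seed-driven bootstrapping cycle described in the abstract (discover rules from known instances, rank and filter them, extract new instances, repeat) can be illustrated with a deliberately simplified sketch. It is not the DARE implementation: it assumes a toy corpus of pre-segmented `(arg1, context, arg2)` triples and uses flat context strings as "rules" instead of recursive rules over parsed text. The function name `bootstrap`, the `min_conf` threshold, and the confidence measure (share of a rule's matches that are already known) are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def bootstrap(corpus, seeds, min_conf=0.5, max_iters=5):
    """Toy seed-driven bootstrapping (illustrative assumption, not DARE).

    corpus: list of (arg1, context, arg2) triples.
    seeds:  set of (arg1, arg2) pairs known to be in the target relation.
    """
    known = set(seeds)
    patterns = set()
    for _ in range(max_iters):
        # 1. Rule discovery: contexts that connect an already known pair.
        candidates = {ctx for a, ctx, b in corpus if (a, b) in known}

        # 2. Ranking and filtering: keep a context only if enough of the
        #    pairs it matches are already known instances.
        matches = defaultdict(set)
        for a, ctx, b in corpus:
            if ctx in candidates:
                matches[ctx].add((a, b))
        accepted = {
            ctx for ctx, pairs in matches.items()
            if len(pairs & known) / len(pairs) >= min_conf
        }
        patterns |= accepted

        # 3. Extraction: apply accepted contexts to find new instances.
        new = {(a, b) for a, ctx, b in corpus if ctx in accepted} - known
        if not new:
            break  # fixed point reached, nothing more to learn
        known |= new
    return known, patterns


corpus = [
    ("Marie Curie", "won the", "Nobel Prize in Physics"),
    ("Albert Einstein", "won the", "Nobel Prize in Physics"),
    ("Albert Einstein", "was awarded the", "Nobel Prize in Physics"),
    ("Toni Morrison", "was awarded the", "Nobel Prize in Literature"),
    ("Marie Curie", "was born in", "Warsaw"),
]
seeds = {("Marie Curie", "Nobel Prize in Physics")}

known, patterns = bootstrap(corpus, seeds)
```

Starting from a single seed pair, the loop first learns the context "won the", uses it to reach a second pair, and from that pair learns "was awarded the". The unrelated "was born in" context is never proposed because it never co-occurs with a known instance.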
Related resources
Expanding The Recall Of Relation Extraction By Bootstrapping
Most work on relation extraction assumes considerable human effort for building an annotated corpus or for knowledge engineering. Generic patterns employed in KnowItAll achieve unsupervised, high-precision extraction, but often result in low recall. This paper compares two bootstrapping methods to expand recall that start with seeds automatically extracted by KnowItAll. The first method is string ...
A Two-stage Bootstrapping Algorithm for Relation Extraction
Bootstrapping has been empirically shown to be a powerful method for learning lexico-syntactic patterns that extract specific relations such as book-author and organization-headquarters. However, it is not clear how to adapt this method to extract more general relations such as the employment-organization (EMP-ORG) relation. Relations like EMP-ORG are actually a set of relations which involves...
Bootstrapping Biomedical Ontologies for Scientific Text using NELL
We describe an open information extraction system for biomedical text based on NELL (the Never-Ending Language Learner) (Carlson et al., 2010), a system designed for extraction from Web text. NELL uses a coupled semi-supervised bootstrapping approach to learn new facts from text, given an initial ontology and a small number of “seeds” for each ontology category. In contrast to previous applicat...
Graph-Based Seed Set Expansion for Relation Extraction Using Random Walk Hitting Times
Iterative bootstrapping methods are widely employed for relation extraction, especially because they require only a small amount of human supervision. Unfortunately, a phenomenon known as semantic drift can affect the accuracy of iterative bootstrapping and lead to poor extractions. This paper proposes an alternative bootstrapping method, which ranks relation tuples by measuring their distance ...
Information Extraction from German Patient Records via Hybrid Parsing and Relation Extraction Strategies
German Research Center for AI, Stuhlsatzenhausweg 3, 66123 Saarbrücken; Institut für Med. Informatik/Charité, Hindenburgdamm 30, 12200 Berlin, {f.mueller, thomas.tolxdorff}@charite.de. Abstract: In this paper, we report on first attempts and findings in analyzing German patient records, using a hybrid parsing architecture and a combination of two relation extraction strate...